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Abstract: Embedding data-aggregation capabilities into sensor nodes of wireless networks 
could save energy by reducing redundant data flow transmissions. Existing research 
describes the construction of data aggregation trees to maximize data aggregation times in 
order to reduce data transmission of redundant data. However, aggregation of more nodes 
on the same node will incur significant collisions. These MAC (Media Access Control) 
layer collisions introduce additional data retransmissions that could jeopardize the 
advantages of data aggregation. This paper is the first to consider the energy consumption 
tradeoffs between data aggregation and retransmissions in a wireless sensor network. By 
using the existing CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) 
MAC protocol, the retransmission energy consumption function is well formulated. This 
paper proposes a novel non-linear mathematical formulation, whose function is to 
minimize the total energy consumption of data transmission subject to data aggregation 
trees and data retransmissions. This solution approach is based on Lagrangean relaxation, 
in conjunction with optimization-based heuristics. From the computational experiments, it 
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is shown that the proposed algorithms could construct MAC aware data aggregation trees 
that are up to 59% more energy efficient than existing data aggregation algorithms. 

Keywords: Data aggregation; MAC aware data aggregation routing; CSMA/CA; 
Lagrangean relaxation; wireless sensor networks 



1. Introduction 



Wireless sensor networks (WSNs) have been blooming recently, which can probe and collect 
environmental information, such as temperature, atmospheric pressure, and irradiation to provide 
ubiquitous sensing, computing, and communication capabilities. A WSN has two important and 
interesting characteristics that are different from traditional wireless networks. First, after the event 
occurs, multiple sensors nodes (denoted as data source nodes) around this event will sense the event, 
and then send the data back to one sensor node (denoted as sink node). Hence, communication mode in 
WSN occurs from multiple data source nodes to one data sink node. This is a type of multipoint-to- 
point, rather than the traditional point-to-multipoint (i.e. multicast) communication in wireless 
networks. For example. Figure 1 shows a data aggregation tree from three data source nodes to one 
sink. This data aggregation tree is a type of reverse-multicast tree. Second, energy saving is possible at 
the nodes on the data aggregation tree because intermediate nodes on the data aggregation tree could 
receive redundant data from the data source nodes. In order to avoid transmitting useless, redundant 
data back to the sink, the intermediate nodes could save energy by collecting and processing data 
before transmission and prevent disconnected networks due to rapid energy depletion of sensors. This 
type of data aggregation capability has been put forward as particularly useful for routing, in terms of 
energy consumption in WSN [2]. 



Figure 1. Data aggregation in MAX. 




There are several data aggregation schemes, and in addition to reducing redundant transmissions, 
other aggregation schemes could compute maximum values (MAX), minimum values (MIN), or 
summations (SUM) of the collected data. For example, in Figure 1, an event in sensing range of data 
source nodes nu n2, and ^3 is probed for temperature (60, 65, and 63T, respectively), and the MAX 
temperature is then sent back to the sink node S. If node ns could aggregate (i.e. MAX = 65°F) these 
data before returning it to the sink, the total number of transmission times for node ns could be reduced 
from three to one. 
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Since it is almost impossible to replace the battery in a sensor node, power efficient communication 
in WSN plays a crucial role. In data aggregation routing, the key issue is how to construct the reverse 
multicast tree in such a way to save the total energy consumption. Most existing research literatures 
construct the tree by only considering the data aggregation aspect [2,6]. The basic idea of these data 
aggregation aspect algorithms is trying to maximize the times of aggregation to reduce the number of 
transmissions. However, there remains one issue important to the construction of a data aggregation 
tree, the MAC layer retransmission issue. 

In WSNs, any sensor nodes within another's transmission range trying to transmit simultaneously 
would result in collision. In addition, two nodes that are not within each other's transmission range 
trying to simultaneously transmit to the same node might also incur collision. This is well known as 
the hidden-node problem. Because of hidden-node problem, the interference range is larger than the 
transmission range in wireless communications. In Figure 2(a), shows that even though the 
transmission radius of nodes ni and do not overlap, collision still occurs at the receiver (node S) 
when they transmit at the same time. When collision occurs, retransmission is required to ensure the 
data is successfully received. These retransmissions incur additional energy consumption, which will 
jeopardize the advantages of data aggregation. Data retransmission times are determined by the total 
number of sensor nodes whose transmission radius covers the receiver (or equivalent to the total 
number of sensor nodes within each other's interference range). In other words, the more flows are 
aggregated, the higher the probability that the senders will incur data retransmission. Hence, there is a 
tradeoff between data aggregation and retransmission. Good data aggregation tree should address data 
aggregation and MAC layer retransmission at the same time. 

Figure 2. Tradeoff between retransmission and data aggregation. 



— Link on aggregation tree 
Sink Node Sink Node 




Total Energy Consumption jotal Energy Consumption = 

= 5.05 milliwatts 4.74 milliwatts 

(a) Without addressing retransmission (b) With addressing retransmission 



Figure 2 gives an illustrative example to show the tradeoff between the data aggregation and 
retransmission, where nodes nu n2, and ns are the data source nodes. Without considering data 
collision, the optimal aggregation tree is as shown in Figure 2(a). Note that when an intermediate node 
aggregates more data, a greater number of collisions would occur at the intermediate nodes, which 
results in additional energy consumption. Node S, the receiver of the three children nodes, will suffer 
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significant collisions that results in more retransmission times. With considered collision effects, a 
more energy efficient data aggregation tree is as shown in Figure 2(b). In this figure, by reducing the 
transmission radius of node ni, and change its routing assignment to node n4, the total energy 
consumption could be reduced. Even though there is extra energy consumption at node n4, there are 
only two children nodes at node and thus, the retransmission times caused by collision could be 
significantly reduced. Hence, the energy consumption associated with retransmission from collisions 
should be carefully addressed in WSN. This example also shows that a good tradeoff between data 
aggregation and retransmission is facilitated by intelligent transmission radius and routing 
assignments. The energy consumption function (including transmission power and retransmission 
power), as shown in Figure 2, is calculated by its objective function (IP), as described in Section 3. 

This paper discusses the impacts of retransmission on data aggregation, and proposes a MAC aware 
energy efficient data aggregation algorithm to consider a tradeoff between the benefits of data 
aggregating and data retransmission costs in WSN. To the best of our knowledge, there is no literature 
that addresses the cross-layer (layer 2 and layer 3) MAC aware data aggregation routing algorithm in 
WSNs. This paper proposes an optimization-based heuristic algorithm to solve the MAC aware 
energy-efficient data aggregation routing problems (MAC-DAR) based on the CSMA/CA protocol in 
WSNs. The problem is first formulated as a nonlinear programming problem, where the objective 
function is to minimize total energy consumption from data transmissions and retransmissions. The 
Lagrangean relaxation scheme in conjunction with the optimization-based heuristic algorithm is 
proposed to solve this problem. From the computational experiments, the proposed solution approach 
outperforms the conventional non-MAC aware data aggregation heuristics. In addition, the proposed 
nonlinear programming formulation for the MAC-DAR problem is based on the existing CSMA/CA 
protocol, and thus, our algorithm could be deployed in the wireless sensor network, without the 
necessity of modifying the MAC protocol in WSNs. In summary, besides better solution quality, our 
proposed approach could be easily deployed in WSNs without changing the existing CSMA/CA 
protocol. 

The remainder of this paper is organized as follows. Section 2, surveys existing related works on 
data aggregation routing and MAC layer protocols in WSNs. In Section 3, mathematical formulation 
of the MAC-DAR in WSNs is proposed. In Section 4, solution approaches, as based on Lagrangean 
relaxation are presented. In Section 5, heuristics are developed for calculating a good primal feasible 
solution. In Section 6, computational results are reported. Finally, Section 7 concludes this paper. 

2. Related Works 

Existing researches have been conducted to address pure data aggregation routing problem in WSN. 
In [2], they devise three interesting suboptimal aggregation heuristics, Shortest Paths Tree (SPT), 
Center at Nearest Source (CNS), and Greedy Incremental Tree (GIT) for data centric routing 
problems. In [6], mathematical formulations for data aggregation problem in WSN are well 
formulated, and an optimization-based heuristic algorithm is then proposed to tackle the problem. In 
[5], they address latency issues in constructing a minimum energy aggregation tree, and propose the 
CCA algorithm, which includes the basic idea of a balanced tree to simultaneously minimize energy 
and latency issues. 
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Several papers have discussed MAC layer protocol in ad-hoc and sensor networks [7-9]. X. H. Lin 
[9] enhanced the standard IEEE 802.1 1 MAC protocol by improving the handshake and power control 
mechanisms. W. Ye [7,8] reviewed several MAC protocols, and discussed design tradeoffs on energy 
efficiency and data transmissions. W. Ye proposed S-MAC protocol to fit the energy-efficient 
requirements for sensor networks, which is also a variation of a CSMA-like protocol that needs extra 
messages for transmitting data. 

Several works have proposed cross-layer algorithms to deal with retransmission issues caused by 
collisions in wireless sensor networks. In [10], instead of dealing with the retransmission issue 
directly, they assign sensor nodes within each other's interference range, which have different 
channels to circumvent collision problems. They proposed integrated channel assignments and data 
aggregation routing algorithms in WSNs. In [11,12], the authors proposed an interesting MAC layer 
anycasting mechanism and randomized waiting at the application layer, to facilitate data aggregation 
spatially and temporally in structure-free sensor networks. They address the collision problem by 
proposing a modified CSMA/CA protocol and randomized waiting scheme to reduce the number of 
retransmissions. 

3. Problem Formulation 

A MAC-DAR in WSNs is modeled as a graph, in which sensors are represented as nodes, and the 
arc connecting the two nodes indicates that one node is within the other's transmission radius. The 
definitions of notations adopted in the formulation are listed below. 



First, the given parameters are shown as follows: 



N 


The set of all sensor nodes 


P 


The set of all candidate paths that connect data source node s to sink node q 


S 


The set of all data source nodes 


h 


Longest distance of shortest path to reach the farthest data source node 


M 


An arbitrary large number 


^p(n,k) 


The indicator function, which is 1 if the link from node n to node k is on path p, and 0 
otherwise 


d„k 


Euclidean distance between node n and node k 


tdata 


Transmission time for transmitting a data packet 


RTS 


Transmission time for RTS frame 


SIFS 


Short inter-frame space time 


6 


Maximum propagation delay for transmitting data packet 


Q 


The sink node 




The set of all possible transmission radii that node n can adopt, which is a discrete set 




Energy consumption function of node n per unit time, which is a function of the sensor's 
transmission radius 


T 


The largest number of retransmission times 
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1 if data source node s uses path p to reach sink node q, and 0 otherwise 


y(n,k) 


1 if the link from node n to node k is on the tree, and 0 otherwise 


r„ 


Transmission radius of the node n 




1 if node k is covered within transmission radius of node n, and 0 otherwise 


Cnk 


Retransmission times of node n to transmit data to node k 



Please note that we do not have to generate all candidate paths that connect data source node s to 
sink node q (i.e., Psq). Section 4 will explain by using Lagrangean multipliers as the link arc weight (in 
Subproblem 2), Xsp will be associated with the minimum-weight path by using the shortest path 
algorithm for each data source node s. 

An analysis of retransmission times is conducted as follows. First, it assumes that each sensor node 
is equipped with a CSMA/CA compatible transceiver, each transmission conforms to a Geometric 
distribution, and each sensor node generates data packets that follow a Poisson distribution at a certain 
rate of X, Successful transmission of data from sender to receiver depends on the number of senders 
whose transmission radius covers the receiver. By considering receiver side collisions, in terms of 
communication radii of sensor nodes, the hidden-node problem is implicitly contemplated. In the 
CSMA/CA protocol, when a sender wants to transmit a packet to a receiver, it will first issue an RTS 
control frame and wait for a CTS frame from the receiver to ensure that the channel be free [4]. 
According to the CSMA/CA protocol, the time interval between RTS and CTS is no larger than a short 
inter-frame spacing ( SIFS ) time. Let the propagation delay from sender to receiver be 6, and 
turnaround time be 29. The overall contention period is ( RTS + SIFS + 26). Then the average 
retransmission time from node n to node k (i.e., Cnk) is as follows: 

Average Retransmission Times k) = ^ = ^ 

^ ^ ' ^ ^ -MRTS+SIFS+20)Z^j, ' (0) 

y succes^n,k) ^ jeN 

y . z ; calculates the total number of senders whose transmission radius covers node k. The 

^ jgN jk 

meaning of (0) is the mean value of the Geometric distribution, where the successful transmission 
probability, say Psuccess, is that no data transmissions are occurring at any node whose transmission 
radius covers receiver node k within the contention period (RTS + SIFS +20). 

The MAC-DAR problem in WSNs is then formulated as the following nonlinear optimization 
problem (IP). 

Zip = min zf^... +(^- Z^..)1-^.(^J (IP) 

kGN 



subject to: 

Z ^ Vn,ksN,ssS (1) 

XZ7w)>max{/,,|S|} (2) 

nsNksN 
sgS psP^g 

Z^,, =1 ^^^^ (4) 
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y V. ; ^ < 1 

y{n,k) - ^ 

ksN 




(5) 


IVl 


yn,kGN 


(6) 


z„ud, < r 

nk nk n 


\/n,kGN 


(7) 


y(n,k) ~ ^ nk 


\/n,kGN 


(8) 


-^-y[nM))M 

c > 

^nk — -X{RTS+SIFS+2e)Y,Zjk 

ieN 

e ' 


\/n,kGN 


(9) 


X = 0 or 1 




(10) 


yin,k) - ^ or 1 


It czAT 




^nk = 0 or 1 




(12) 






(13) 






(14) 


G {0,1,2,...., T} 


Vn,A:GA^. 


(15) 



The objective function of (IP) is to minimize total energy consumption, where ^t^^t^-e^irj 

nsN 

captures the energy consumption from data transmission; and yi RTS ' T^c^^ ye^jr^) captures the 

energy consumption from retransmission. In conjunction with the objective function, three sets of 
constraints {data aggregation tree, transmission coverage, and retransmission times) are enforced in 
Problem (IP) to give the MAC-DAR problem. 

A. Data aggregation tree constraint 

The basic idea of this set of Constraints, are to ensure that the union of all routing paths, from data 
source nodes to sink shall be a data aggregation tree. Recall that a data aggregation tree is a reverse- 
multicast tree, which is a multicast tree rooted at the sink node, but with opposite transmission 
directions. The data aggregation tree properties are enforced by Constraints (1) to (5). Constraint (1) 
requires that if path p is selected for source node s to reach sink node q, the path must be on the tree. 
This constraint also enforces that if links (n, k) on path p are adopted by source node s to reach the sink 
node, then y(n,k) must be 1. Constraints (2) and (11) require that the total number of links on an 
aggregation tree is at least the maximum of h and the cardinality of S, Both h and l^l are legitimate 
lower bounds of the total number of links on an aggregation tree, and they could be calculated in 
advance [3]. According to [3], introducing Constraint (2) will significantly improve solution quality. 
The left-hand term of Constraint (3) calculates the number of paths that are destined for the sink node, 
and pass through link {n, k) on the aggregation tree. The right-hand term of Constraint (3) is at most 
\S\, When the union of paths destined for a sink node contains a cycle, and this cycle contains link {n, 
k), then Constraint (3) would not be satisfied because there would be too many paths passing through 
this link. In other words. Constraint (3) enforces the union of paths that do not contain a cycle [6]. 
Constraints (4) and (10) require that any data source adopts only one routing path destined for the sink 
node. Constraint (5) is the outgoing link constraint. All intermediate nodes on the aggregation tree 
should have only one outgoing link. For example, in Figure 1, each node on the data aggregation tree 



Sensors 2009, 9 



1525 



has only one outgoing link to the sink node. In summary, Constraints (l)-(5), (10), and (12) enforce 
that the union of all routing paths shall be a data aggregation tree. 



B. Transmission coverage constraint 



The basic idea of this set of constraints are to ensure that if a node k is covered within the 
transmission radius of node n, then the distance between node n and node k must be smaller than the 
transmission radius of node n. Because M is a very large number, on the left hand side of Constraint 

(6), if > d^j^ (i.e., node k is within the transmission radius of node n), then o < < i . This will 

M 

force Znk to be equal to 1. On the other hand, if < d^^ then ~ ^nk < q ^ then, Znk could be equal to 0. 

M 

Constraint (7) enforces that if node k is covered within the transmission radius of the node n, then the 
transmission radius of node n must be larger than the distance between nodes n and k. Hence, 
Constraints (6) and (7) specify the transmission coverage constraints for decision variables and . 

Then Y^j^N^jk •> which is used in Equation (0), calculates the total number of senders whose 

transmission radius covers the node k. Constraint (8) relates decision variable y(n,k) to Znh When y(n,k) 
equals to 1, it will force Znk to be 1. 

Constraint (13) restricts that the set of possible transmission radii that node n can adopt is a discrete 
and finite set. Constraint (14) ensures that each data source node turns on its transmission radius. The 
transmission radius of each source node cannot be 0. 



C. Retransmission time constraints 



The basic idea of this set of constraints are to calculate the retransmission times of node n to 

transmit data to node k, where the retransmission times are determined by the total number of nodes on 

the data aggregation tree, whose transmission radius covers node k. Constraint (9) calculates the 

retransmission times of node n to transmit data to node k. Since only the sensor nodes on the 

aggregation tree need to calculate retransmission times, when = 1, the right side of Constraint (9) 
is the same as Equation (0), i.e., to enforce the retransmission times (i.e., ) and should be at least 

the average retransmission times. When = 0, the right side of Constraint (9) is zero, it implies that 
there is no retransmission time constraint. Constraint (15) is an integer constraint of retransmission 
times. 

In order to make the problem (IP) tractable, a natural logarithm is used on both sides of Constraint 
(9) for applying the Lagrangean relaxation schemes, 

ln(c.,)>ln 



-^RTS+SIFS+20)^Zji^ 



HcJ >AiRTS+SIFS+20)Zzj,-M + My^^^,^ 

j'gN 



Hence, Constraint (9) becomes 

\n(c„,)>A(RTS+SIFS+20)Zzj,-M + My^„j^^ yn,keN (9) 
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4. Lagrangean Relaxation 



The algorithm structure is based upon Lagrangean relaxation. In (IP), by introducing Lagrangean 
multiplier vectors u\ u, w^, u^, u\ and u\ Constraints (1), (3), (6), (7), (8), and (9) are dualized to 
obtain the following Lagrangean relaxation problem (LR). 

Zd = 



+ 



" nsNksN nsNksN tigN k^N \ JgN 



HGNkGN 

subject to: 



(LR) 



Z llyin,k) >max{/z,|S|} 

HGNkGN 

Z yin,k) ^ 1 

= 0 or 1 

y(n.ic) = 0 or 1 

= 0 or 1 
r„ € i?„ 
r„ ^0 

c„, €{0,1, 2,...., r} 



Vn,kBN 



(16) 
(17) 

(18) 

(19) 
(20) 
(21) 
(22) 
(23) 
(24) 



(LR) can be decomposed into four independent subproblems. 



Subproblem 1 : for 



(SUBl) 



subject to (16), (18) and (20). 
Subproblem 2: for x^^ 



n&N k&N s&S p^Psq 



subject to (17) and (19). 
Subproblem 3: for r„ and c„^ 



(SUB2) 



(SUB3) 



subject to (22), (23) and (24). 
Subproblem 4: for z„j^ 



"^i"ZZ +^(^+cra+2^Z" 



6 



subject to (21). 



(SUB4) 
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The proposed algorithm for solving (SUBl) is described as follows. 

Stepl.For each link {n,k) compute the coefficient ulj^ + ulj^M - ulj^ \S\-Y.Kk ^^^^ 

y(n,k)' 

Step2. For all outgoing links of node n, find the smallest coefficient. If the smallest 
coefficient is negative, then set the corresponding y(n,k) as 1, and the other outgoing links 
y(n,k) as 0, otherwise set all outgoing links j^^^^y^; as 0. Repeat step 2 for all nodes. 
Step3. If the total number of y(n,k), whose value is 1 (denoted as r) are smaller than max{h, 
\S\}, then first let each y(n,k) whose corresponding coefficient is negative be 1. Second, 
assign the (max{h, \S\}-t) number of y(n,k) to be 1 whose corresponding coefficients are 
the smallest positive values. Third, let the remaining j^^j^; be 0. 

The computational complexity of above algorithm is 0(\N\ ). 

(SUB2) can be further decomposed into 15*1 independent shortest path problems with nonnegative 
arc weight whose value is ^^^^ + . For each shortest path problem, it can be effectively solved by 

Dijkstra's algorithm. The computational complexity of Dijkstra's algorithm is 0(\N\ ) for each data 
source node. 

(SUB3) can be optimally solved by exhaustively searching the combination of radius r„ and Cnh The 
computational complexity of (SUBS) is 0(\R^\T) for each node n. 

In (SUB4), if the corresponding coefficient _ ^^^m - ul + A(RTS + CTS + 2^)^ of link (n, k) 

is negative then set Znk to be 1, otherwise 0. The computational complexity of (SUB4) is for each 
link (n, k). 

According to the algorithms proposed above, the Lagrangean relaxation problem can be effectively 
and optimally solved. Based on the weak Lagrangean duality theorem, Zd(u ,u ,u\u^) is a lower 
bound of Z/p. The tightest lower bound is calculated by using the subgradient method [1]. 

5. Obtaining Primal Feasible Solutions 

It is noted that solutions to the problem (LR) may not be feasible for the primal problem (IP), 
because six constraints are relaxed to the objective function. This paper proposes an optimization- 
based integrated primal feasible algorithm, called LGR-Primal, which jointly address data aggregation 
and retransmission to obtain primal feasible solutions to the problem (IP). The information in problem 
(LR) provides useful information to obtain good primal feasible solutions. In LGR-Primal, the 
information from the Lagrangean relaxation (the solutions to the dual problem and the Lagrangean 
multipliers) is used to optimize the tradeoff between data aggregation and retransmission. 

LGR-Primal is presented in Algorithm 1; it identifies the routing path (i.e., Xsp) for each data source 
node, and then the data aggregation tree is obtained by unifying all the routing paths from each data 
source node to the sink. In order to obtain an energy efficient data aggregation tree, the link arc weight 
assignment optimizes the tradeoff between data aggregation and retransmission. 
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When the routing path Xsp of each data source node s, which is returning to the sink node, is 

determined, then the selected links (i.e., y(n,k)) on the data aggregation tree could also be determined. In 

addition, the transmission radius (i.e., r^) of each node could also be determined to cover the 

termination node for all selected links on the data aggregation tree. After the transmission radius of 

each node is determined, then the coverage decision variable Znk could also be determined. Finally, the 

value of retransmission time Cnk could also be determined to satisfy Constraint (9). 

In Step 1 of Algorithm 1, the first term of the arc weight assignment is energy consumption for link 
{n, k), e^j^id^j^), which captures the minimum transmission power to select link (n, k) on the data 

aggregation tree. It is worth noting that, the physical meaning of the Lagrangean multiplier is the 
violating cost of the associated constraint. The second term, , captures the penalty cost for violating 

the tree constraint. The third and forth terms, and , capture penalty costs for violating 

transmission coverage constraints. The fifth and sixth terms, wf^ and , capture the penalty costs for 

violating retransmission time constraints. Hence, based on this link arc weight assignment, this paper 
attempts to identify the minimum transmission power data aggregation tree by considering the extra 
costs from retransmissions. 

The computational complexity of Algorithm 1 at Step 2 is 0{\S\\N\^). At Steps 3 and 4, when 

computing the retransmission power of the objective function, it requires calculation of the total 
number of other nodes, whose transmission radius covers node k (Equation (0)), in order to determine 
the retransmission times Cnk from node n to node k. Therefore, the computational complexity is 
0{\N f) . Hence, the computational complexity of the LGR-Primal algorithm should be Oi\N f) . 

However, if each sensor node k is equipped with GPS, which enables it to know their neighboring 
sensors nodes, then to determine the retransmission times of Cnk from node n to node k would only 
need to calculate the ^'s neighboring nodes (instead of all the other sensor nodes in the WSN) whose 
transmission radius covers node k. In this case, the computational complexity for Steps 3 and 4 would 
only be 0(| A^|^) . Then the computational complexity of the LGR-Primal algorithm should be 

0{\S \\N , This makes this algorithm scalable to a large scale WSN. 

Algorithm 1. LGR-Primal Algorithm. 

Step 1) Assign the arc weight of the each link {n, k) e^,{d^,)+u I, + u], + ul, +u I + ul . 

Step 2) Perform a Dijkstra's shortest path algorithm to identify the routing path (i.e., Xsp) from each data 
source node s to the sink node. 

Step 3) Determine the other decision variables (y(n,k), J^n, ^nk and Cnk) without violating the associated 
constraints. 

Step 4) Calculate the objective value of the problem (IP). 

The following section will show a complete algorithm (denoted LGR), as based on subgradient 
method [1] for solving problem (IP). The computational complexity of the LGR is 0(|aS||A^|^ + |A^||7^„|r) . 

Algorithm 2. LGR Algorithm. 

Begin 

Input Network topology, data source nodes 
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Output: Data aggregation tree 

Initialize Lagrangean multiplier vectors (o) = 0, V/ = 1,....,6. 

UB = 00 and LB = -oo (upper and lower bounds, respectively). 

quiescence _age = 0, and step_size = 2. 

For iteration = 1 to Max_Iteration, perform the following: 

Solve subproblem 1, subproblem 2, subproblem 3, subproblem 4. 

Compute Zjy in (LR). 

If Zj,{u) >LB 

LB = Zj^iu) and quiescence age = 0. 
Else quiescence age = quiescence age + 1. 
If quiescence age = Quiescence Threshold 

step size = step_size/2 and quiescence age = 0. 
Run Algorithm 1 (LGR-Primal). Compute the new upper bound ub. 
If ub < UB then UB = ub. 
Update the step size. 

Update the Lagrangean multiplier vectors. 
End For 
End 



6. Computational Experiments 

The proposed algorithms for MAC-DAR problems are coded in C and run on a PC with PIV-2G. In 
a LGR algorithm, Max_Iteration and quiescence age are set to 2000 and 30, respectively. The step size 
coefficient step_size, is initialized as 2, and is halved when the objective function value of the dual 
problem is not improved by iterations reaching quiescence age. The computational times for the 
following experiments are all within five minutes. 

The network topology comprises \N\ i= 150 in Figure 3 and 4, up-to 250 in Figure 5) sensor nodes 

randomly placed within a 1 x 1 square unit area. The cost of the energy consumption function (in 
milliwatts), e^{r^) , is defined as the square of 100 x Euclidean distance multiplied by the energy 

consumption per millisecond when the sensor node is transmitting data. The set of all possible 
transmission radii of sensor node n (i.e., Rn) are configured to begin from 0 to the maximum 
communication radius (e.g. 0.25 in Figure 3) with step size 0.01. The CSMA/CA related parameters 
( RTS , SIFS, 6) are the same settings as in [4]. To evaluate the solution quality of our proposed 
algorithm, four existing algorithms are implemented for comparison. The SPT, GIT, and CNS 
algorithms are proposed in [2], and the forth algorithm CCA, is described in [5]. It is worth noting that, 
all four heuristic constructs data aggregation trees without considering MAC layer collision effects. 
Each plotted point in Figures 3-5 is a mean value over 10 simulation results. 

Two different models in WSN are simulated. The first is an event-driven, where neighboring sensor 
nodes of the event will become the data source nodes. The second is a random-source, where data 
source nodes are determined in random. Hence, the data source nodes in an event-driven model will be 
closer to each other than in a random-source model. 
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Figure 3 shows the total energy consumption with increasing numbers of data source nodes. When 
the number of source nodes is large, the aggregation tree is larger. It is shown that the LGR algorithm 
can obtain the best solution quality, as compared to other heuristics in both event-driven and random- 
source models. In addition, in an event-driven model, the solution quality of LGR algorithm is even 
more significant than other heuristics. This is because in an event-driven model, the data source nodes 
are clustered to increase probability of collisions. Hence, heuristic algorithms that do not address the 
MAC collision suffer from severe retransmission occurrences. 

In Figure 4, the effects of communication radii on energy consumption are examined with 90 data 
source nodes. As shown, the LGR algorithm can still obtain best solution quality, as compared to the 
other heuristics on both models. Interestingly, even though a large maximum communication radius 
could increase the probability of data aggregation, it is shown that a large maximum communication 
radius did not offer any advantage for MAC aware energy efficient data aggregation trees, because a 
large communication radius leads to severe collisions that could jeopardize the advantages of data 
aggregation. In addition, in an event-driven model, the best solution for the LGR algorithm is at 0.18, 
instead of at 0.16 communication radius. This reveals that; for too small communication radii, even 
though the collision probability is low, it will not provide data aggregation advantages that save total 
transmission power. Hence, best communication radii setting should consider the tradeoff between 
collision and data aggregation. 



Figure 3. Total Energy Consumption with respect to no. of data source nodes. 
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Figure 4. Total Energy Consumption with respect to Transmission Radius. 
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Figure 5. Total Energy Consumption with respect to Network Size. 
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Figure 5 depicts the experiments evaluating the solution quality under different network sizes (i.e., 
network density). The LGR algorithm outperforms the other heuristics for all the network sizes. In 
large network size (i.e., a high density of sensor nodes within a fixed deployment area), the solution 
quality of LGR over the other heuristics is more significant. Recall that in the first term (i.e., 
^^^Hhata'^n^^n)^ objectivc fuuctlou in problem (IP), which favors communication links of 

shorter distances, thus, the data aggregation tree will be composed of more relay nodes for data 
aggregation. However, in this case, as too many relay nodes on the data aggregation tree will introduce 
higher probability of collision, we will have larger retransmission cost (i.e., the second term of the 
objective function in problem (IP)). It is expected that the second term will play a more important role 
in a dense network topology. According to Figure 5, the solution for LGR increase more mildly than 
the other heuristics in a dense network topology, which reveals that LGR will not select a large 
number of hops for data aggregation, in order to avoid extra energy loss from retransmissions in a 
dense network topology. 

The improvement ratio is defined as (other approach — LGR)/(LGR)xlOO% to show the solution 
quality. In Table 1, the improvement ratio of LGR over SPT, GIT, CNS, and CCA is 57%, 42%, 29%, 
and 59%, respectively. 



Table 1. Improvement Ratio. 



Improvement Ratio 


Figure 3 


Figure 4 


Figure 5 


SPT 


(53%, 43%) 


(57%, 33%) 


(49%, 45%) 


GIT 


(40%, 30%) 


(42%, 29%) 


(42%, 22%) 


CNS 


(29%, 21%) 


(14%, 6%) 


(27%, 27%) 


CCA 


(59%, 24%) 


(31%, 24%) 


(37%, 35%) 



7. Conclusions 

In addition to the data aggregation, retransmission energy loss due to MAC collision plays another 
major crucial factor for energy-efficient data-aggregation routing in WSNs. This paper is the first one 
proposing a novel nonlinear mathematical formulation for MAC aware energy efficient data 
aggregation routing problems in WSN, where the objective function is to minimize the total energy 



Sensors 2009, 9 



1532 



consumption (including data transmission power and retransmission power) subject to data 
aggregation trees, transmission coverage, and data retransmissions constraints. The proposed solution 
approach is based on Lagrangean relaxation to construct a MAC aware energy-efficient data 
aggregation tree that jointly considers the tradeoff between data aggregation and data retransmission 
by using the Lagrangean multipliers. According to the computational experiments, the proposed LGR 
algorithm outperforms other heuristics under all tested cases, especially in an event-driven model. This 
is because in an event-driven model, the data source nodes are clustered, and thus, extra energy loss 
from retransmissions will be more significant. This indicates that a good data aggregation algorithm 
should be a cross layer algorithm that jointly addresses data aggregation in the network layer, and the 
retransmission energy loss in the MAC layer. 
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